Lifestyle Transitions in Fusarioid Fungi are Frequent and Lack Clear Genomic Signatures

Abstract The fungal genus Fusarium (Ascomycota) includes well-known plant pathogens that are implicated in diseases worldwide, and many of which have been genome sequenced. The genus also encompasses other diverse lifestyles, including species found ubiquitously as asymptomatic-plant inhabitants (endophytes). Here, we produced structurally annotated genome assemblies for five endophytic Fusarium strains, including the first whole-genome data for Fusarium chuoi. Phylogenomic reconstruction of Fusarium and closely related genera revealed multiple and frequent lifestyle transitions, the major exception being a monophyletic clade of mutualist insect symbionts. Differential codon usage bias and increased codon optimisation separated Fusarium sensu stricto from allied genera. We performed computational prediction of candidate secreted effector proteins (CSEPs) and carbohydrate-active enzymes (CAZymes)—both likely to be involved in the host–fungal interaction—and sought evidence that their frequencies could predict lifestyle. However, phylogenetic distance described gene variance better than lifestyle did. There was no significant difference in CSEP, CAZyme, or gene repertoires between phytopathogenic and endophytic strains, although we did find some evidence that gene copy number variation may be contributing to pathogenicity. Large numbers of accessory CSEPs (i.e., present in more than one taxon but not all) and a comparatively low number of strain-specific CSEPs suggested there is a limited specialisation among plant associated Fusarium species. We also found half of the core genes to be under positive selection and identified specific CSEPs and CAZymes predicted to be positively selected on certain lineages. Our results depict fusarioid fungi as prolific generalists and highlight the difficulty in predicting pathogenic potential in the group.


Introduction
Fusarium (Hypocreales, Ascomycota) is a globally distributed genus of approximately 230 species (https://www.fusarium.org/), many of which are implicated in devastating fungal diseases of plants. For instance, throughout the first half of the 20th century, Fusarium wilt of banana single-handedly wiped out the main globally traded banana cultivar, equivalent to losses of at least US$2.3 billion in 2000 (Ploetz 2005). A new Fusarium epidemic is now affecting the current dominant banana cultivar (Ordonez et al. 2015). Moreover, on the much-cited list of the top 10 fungal plant pathogens by Dean et al. (2012), two spots belong to Fusarium species. Beyond plant pathogenicity, however, many species are also reported to exhibit an array of other fungal lifestyles (see supplementary table S1, Supplementary Material online), and Fusarium strains are also frequently isolated from inside healthy plant tissues (e.g., Parsa et al. 2016; Zakaria and Aziz 2018; Rashmi et al. 2019). Fungal inhabitants of plant tissues which cause no symptoms of disease are known as fungal endophytes, hyperdiverse microfungi that are omnipresent in plant microbiomes (Hardoim et al. 2015).
There is no single role that endophytes play in the plant host, as the endophytic lifestyle represents a functional range between pathogenicity and mutualism, which has been dubbed the "endophytic continuum" (Schulz and Boyle 2005). The outcome of endophyte colonisation can be highly dependent on the context of the plant-fungal interaction, such as the status of the plant immune system and nutrient conditions (Junker et al. 2012;Lahrmann et al. 2015;Hacquard et al. 2016;Hiruma et al. 2016), as well as the presence of other endophytes within the microbiome (Redman et al. 2001;Durán et al. 2018;Mesny et al. 2021;Wolinska et al. 2021) and even light conditions (Álvarez-Loayza et al. 2011). The transient status of endophytism for many taxa is evident from observations of endophytes becoming decayers (saprotrophs) or pathogens following some change in host or abiotic conditions (Slippers and Wingfield 2007;Arnold et al. 2009;Promputtha et al. 2010;Swett and Gordon 2015;Nelson et al. 2020). In some cases, however, an evolutionary transition from pathogenicity to endophytism may represent a permanent switch to obligate commensalism or mutualism (Gazis et al. 2016), and it has also been hypothesised that endophytism may have been an ancestral "waiting room" for the evolution of mycorrhizal symbiosis (Selosse et al. 2018).
The need to categorise pathogenic potential of Fusarium taxa is obvious considering the ubiquity of Fusarium endophytes in our crops (e.g., Rubini et al. 2005; Leoni et al. 2013; Sandoval-Denis et al. 2018) and the ramifications of pathogenic Fusarium strains for food security (e.g., Menzies et al. 1990; Kokkonen et al. 2010; Okello et al. 2020). In the >200 years since Fusarium was first described, the generic concept has been the source of lively debate (Summerell 2019). In recent years, many Fusarium species complexes have been reclassified, based on phenotypic and phylogenetic evidence, into distinct "fusarioid" genera such as Albonectria, Bisifusarium, Cyanonectria, Geejayessia, Neocosmospora, and Rectifusarium (Schroers et al. 2011; Lombard et al. 2015; Sandoval-Denis et al. 2019), resulting in a narrower definition of the genus, Fusarium sensu stricto. This has been opposed in some quarters, with the argument that retaining a broader definition of the genus (Fusarium sensu lato) is desirable to facilitate communication between scientists and practitioners dealing with agriculturally and clinically important species that have historically been classified under Fusarium (Geiser et al. 2021; O'Donnell et al. 2020). Crous, Lombard, et al. (2021) countered that, in light of ever-increasing species discovery and recognised chemical and morphological differences between these clades, reclassification of certain species complexes into different genera is both biologically and practically meaningful. However, both sides of the debate note that ecology is similar among many of these taxa, and so questions regarding lifestyle warrant a perspective that includes allied fusarioid genera.
An evolutionary genomics approach using genomes from diverse lifestyles of fusarioid fungi could address this issue of detecting where strains fall on the pathogenic-mutualistic spectrum. A phylogenomic framework could not only shed light on the timing and frequency of lifestyle transitions in the group, but also inform to what extent genetic content is shared between taxa due to ancestry versus lifestyle. In addition to comparing gene repertoires, detecting signatures of selection may also help to uncover the genetic basis of recently evolved traits. Methods based on the ratio of nonsynonymous to synonymous substitutions (dN/dS) and the phenomenon of codon usage bias (where certain codons appear more frequently than others despite encoding the same amino acid) can be used to investigate the extent of selection acting on gene content.
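Codon usage bias is commonly summarised as relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal Python sketch of that calculation, assuming the standard genetic code and a single in-frame coding sequence. The function name rscu and the toy sequence are illustrative only and are not taken from the pipeline used in this study.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stops).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """Return RSCU per codon for one in-frame coding sequence.

    RSCU = observed codon count * number of synonymous codons
           / total count for that amino acid.
    Stop codons, single-codon amino acids (Met, Trp) and amino acids
    absent from the sequence are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    # Toy sequence (hypothetical): four alanine codons, three GCT and one GCC.
    for codon, value in sorted(rscu("GCTGCTGCTGCC").items()):
        print(codon, round(value, 2))
```

An RSCU of 1 indicates no bias for that codon, while values above or below 1 indicate over- or under-representation relative to its synonyms.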
One genetic feature that can be particularly illuminating to compare between lifestyles is genes that encode effector proteins. Fungal effectors (known as candidate-secreted effector proteins [CSEPs] when computationally predicted) are small secreted proteins produced by fungi which mediate the plant-fungal interaction. While best-studied in the context of pathogenicity (Stergiopoulos and de Wit 2009;De Jonge et al. 2011), we now know that effectors are also essential for mutualistic or commensal fungi to form associations with plant hosts by evading the host immune response (Rafiqi et al. 2012;Plett and Martin 2015;Lo Presti et al. 2015). Effector repertoires have been shown to differentiate host-specific strains (forma specialis) in the Fusarium oxysporum species complex (FOSC) (van Dam et al. 2016), and could potentially further distinguish pathogenic and endophytic FOSC strains (Czislowski et al. 2021). Another frequently studied group of proteins involved in the plant-fungal interaction are carbohydrate-active enzymes (CAZymes), many of which act as plant cell wall-degrading enzymes (PCWDEs) (Kubicek et al. 2014). CAZymes are often referred to as saprotrophic features (Lebreton et al. 2021), but are also abundant in plant pathogens and endophytes (e.g., Zhao et al. 2013;Knapp et al. 2018;Mesny et al. 2021), and, although present in lower numbers in mycorrhizal fungi (Kohler et al. 2015;Peter et al. 2016;Miyauchi et al. 2020), certain CAZymes play key roles in the establishment and maintenance of the symbiosis (Veneault-Fourrey et al. 2014;Doré et al. 2017;Marqués-Gálvez et al. 2021). Comparing CSEP and CAZyme repertoires is therefore highly relevant to exploring genetic differences in plant associated lifestyles of fusarioid fungi.
Here, we performed whole genome sequencing, assembly, and structural annotation of five novel endophytic Fusarium strains (supplementary table S2, Supplementary Material online), including the first whole genome sequencing data and annotated assembly for the recently described species, Fusarium chuoi (Crous, Osieck, et al. 2021). Using predicted genes from these and other publicly available fusarioid strains, we produced a genome-scale phylogeny of Fusarium and allied genera with time calibration and compared CSEP and CAZyme content to answer the following questions: 1) How are lifestyles distributed across the phylogeny? 2) Can we distinguish plant pathogens and endophytes from genome sequences alone? and 3) How is selection acting on different lifestyles?

Results
Both Single- and Multi-Copy Genes Inferred the Same Backbone for Fusarium s. str.
To infer the genome-scale phylogeny of Fusarium, we used both concatenation and coalescent-based approaches, using single-copy genes with and without multi-copy genes also included (see our bioinformatics pipeline in supplementary fig. S1, Supplementary Material online). Including multi-copy genes had a greater impact on topology than tree building approach (i.e., concatenation versus coalescent) (fig. 1A). This was seen chiefly from a change in divergence order of allied genera, namely Neocosmospora (= Fusarium solani species complex, "FSSC"), Geejayessia (= Fusarium staphyleae species complex, "FSTSC"), and Albonectria (= Fusarium decemcellulare species complex, "FDESC"), when including multi-copy genes (fig. 1B). All methods, however, produced the same divergence order for Fusarium s. str. species complexes. Disregarding differences in the naming of species, our estimations of Fusarium s. str. from 1,060 loci were in broad agreement with the most recent phylogenetic analyses by Crous, Lombard, et al. (2021) and Geiser et al. (2021).
Hill et al. · https://doi.org/10.1093/molbev/msac085 MBE
We additionally compared the impact of two alignment trimming tools, trimAl (Capella-Gutiérrez et al. 2009) and BMGE (Criscuolo and Gribaldo 2010), on species tree topology. The RAxML-NG (Kozlov et al. 2019) species tree was identical for both trimming tools, but the choice of trimming tool impacted topology for IQ-TREE (Minh et al. 2020) and ASTRAL-III (Zhang, Rabiee, et al. 2018), with discordance in the ambrosia clade of Neocosmospora (supplementary fig. S2, Supplementary Material online). The gene trees trimmed with trimAl were selected for downstream analyses based on the reported accuracy of trimAl relative to BMGE in the literature (Tan et al. 2015; Steenwyk et al. 2020). The RAxML-NG species tree was selected for downstream analyses as its topology was identical for both trimming tools and its branch lengths are in units of substitutions per site rather than coalescent units.

Dated Genome-Scale Phylogeny of Fusarium and Allied Genera
For divergence time estimation of the RAxML-NG species tree, we used both the independent-rates (IR) and autocorrelated-rates (AR) relaxed clock models, implemented in MCMCTree (Yang and Rannala 2006). Testing best-fit of clock models in MCMCTree (see dos Reis et al. 2018) is not possible using amino acid data, and so our assessment of divergence time estimation from the two clock models was restricted to comparisons against previous studies. The IR model generally shifted nodes towards more recent divergence times in comparison to the AR model (supplementary fig. S3, Supplementary Material online). The dating of the diversification of various formae speciales in the FOSC by our IR model was also a better fit with their crop hosts having been domesticated within the last 10,000 years (Meyer et al. 2012).
Nonetheless, pairwise PERMANOVA identified the insect mutualist lifestyle as the most genetically distinct, with insect mutualist taxa having significantly different gene, CSEP, and CAZyme repertoires compared with all other lifestyles other than mycoparasite. While most other lifestyles were genetically similar, endophytes and saprotrophs were also found to be significantly different in terms of CSEPs. In a similar pattern to the number of strain-specific genes, mean gene, CSEP, and CAZyme copy number were not found to be significantly different between lifestyles (supplementary fig. S4B, supplementary table S3, Supplementary Material online), but there were extreme outliers in copy number amongst plant pathogens (fig. 2C). The greatest copy number outlier by a considerable margin was predicted to be both a CSEP and CAZyme belonging to F. oxysporum f. sp. conglutinans, annotated as a glycosyltransferase in the GT4 family: α,α-trehalose phosphorylase (configuration-retaining) (EC 2.4.1.231).

Almost Half of Core Single-Copy Genes were Under Positive Selection
While gene, CSEP, and CAZyme repertoires may have been broadly shared, we were interested in whether genes were evolving in a lifestyle-directed manner. Of the 1,054 core single-copy genes used in the selection analyses, 469 (44%) were found to be under episodic positive selection by both BUSTED and aBSREL (fig. 3A). This included 11 of 31 (35%) core CSEPs and 6 of 11 (55%) core CAZymes. The branch at the root of the more conservative generic concept, Fusarium s. str., was a particular "hotspot" of positive selection, with 52 core single-copy genes positively selected according to BUSTED and aBSREL (supplementary fig. S5, Supplementary Material online). A few external branches also had a notably high number of positively selected core genes: insect mutualist N. oligoseptata; saprotrophic F. culmorum in the F. sambucinum species complex (FSAMSC); and plant pathogenic F. oxysporum f. sp. lycopersici in the FOSC. There was no significant difference in the number of positively selected genes on external branches between lifestyles according to analysis of variance (ANOVA, P = 0.7; supplementary table S3, Supplementary Material online).
Although a minority of all CSEPs (11%) could be assigned known gene names using the PHI-base database (Urban et al. 2020), two core CSEPs with signatures of selection could be classified as known genes: 5680 as glycoside hydrolase GH35 (β-galactosidase) on hemicellulose and pectin and GH51 (non-reducing end α-L-arabinofuranosidase) on cellulose, hemicellulose and pectin; carbohydrate esterase CE12 (rhamnogalacturonan acetylesterase) on pectin; and an enzyme of auxiliary activities AA3_2 (5′-oxoaverantin cyclase) on lignin.
Most CSEPs and CAZymes reported as positively selected by both BUSTED and aBSREL were also found to contain sites with a higher relative selective pressure in certain lifestyles by Contrast-FEL (Kosakovsky   (fig. 3B). In most cases only one site per gene was found to have a difference in relative selective pressure. The insect mutualist lifestyle had significantly more sites per gene under higher selective pressure compared with most other lifestyles ( fig. 3B). We should emphasise that Contrast-FEL does not inform whether positive or negative selection is occurring on a branch set, only that there is a relative increase or decrease in dN/dS, and thus higher or lower selective pressure, compared with other branches. We reasoned that if a CSEP or CAZyme with higher relative selective pressure for a lifestyle was also found to be positively selected on an external lineage of that lifestyle, then it could suggest that the selective pressure is imposed by lifestyle. This was the case for 4 of the 9 core CSEPs and 1 of the 3 core CAZymes identified as positively selected on external lineages: CSEPs 6447 (F. sp. 6, endophyte); 5996 (F. culmorum, saprotroph); 6862 (N. ambrosia, insect mutualist); and 7076 (Geejayessia zealandica, plant associate); and CAZyme 7215 of lignin degrading subfamily AA3_2 (N. oligoseptata, insect mutualist).

Codon Optimisation was Higher in Fusarium s. str.
As dN/dS methods are biased by the erroneous assumption that all synonymous substitutions are neutral (Hershberg and Petrov 2008; Rahman et al. 2021), we also explored whether translational selection (i.e., bias towards certain codons in more highly expressed genes) may be acting on synonymous substitutions by assessing the extent of codon optimisation (S) across fusarioid taxa (dos Reis et al. 2004). Codon optimisation of 1,054 core single-copy genes was generally high for all taxa (between 0.4 and 0.6, on a scale from −1 to 1), but it was significantly lower in insect mutualists compared with endophytes, plant pathogens and saprotrophs (fig. 4A, supplementary table S5, Supplementary Material online). S values were found to be significantly higher in CSEPs and CAZymes than other core single-copy genes for all lifestyles (excluding mycoparasite, which could not be tested due to small sample size); furthermore, codon optimisation of CAZymes was also significantly higher than CSEPs for insect mutualists [...] also encompassed greater extremes of codon optimisation than other core genes.
As high levels of codon optimisation have been linked to host generalism in fungi (Badet et al. 2017) and codon usage bias to wide habitat range in prokaryotes (Botzman and Margalit 2011), we speculated that higher codon optimisation may be associated with lifestyle generalism, that is, taxa being capable of exhibiting multiple lifestyles. When no data correction was performed, there was a medium strength positive correlation between the number of reported lifestyles or "lifestyle range" and S values (Pearson's r = 0.3, P = 0.01), but the statistical significance of this correlation did not hold when accounting for phylogenetic relationships with phylogenetic generalised least squares (PGLS) analysis (P = 0.06) (supplementary fig. S8, Supplementary Material online).
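For readers less familiar with the uncorrected statistic, Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it in plain Python. The function name pearson_r and the example values are hypothetical and are not data from this study, and the phylogenetic correction applied by PGLS is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values: lifestyle range and codon optimisation (S) per taxon.
    lifestyle_range = [1, 2, 2, 3, 4]
    s_values = [0.42, 0.45, 0.50, 0.48, 0.55]
    print(pearson_r(lifestyle_range, s_values))
```

Because related taxa are not statistically independent observations, a significant r from this calculation can lose significance once shared ancestry is modelled, which is the pattern reported above.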
There was significantly higher codon optimisation in species complexes belonging to Fusarium s. str. compared with allied genera (t-test, P = 6e−11; fig. 5A inset). Codon optimisation for CSEPs was shown to be strongly [...] (fig. 5A). This was not the case for CAZymes, however, for which the fit of codon optimisation to the PCA was not significant (P = 0.2). Hierarchical clustering of taxa by normalised relative synonymous codon usage (RSCU) was also reasonably concordant with the species tree, with a Robinson-Foulds distance of 0.4 (P = 0; fig. 5B), indicating that codon usage bias, for CSEPs if not CAZymes, is likely to be influenced by shared ancestry more than lifestyle.
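A normalised Robinson-Foulds distance measures topological disagreement between two trees on the same taxa, with 0 meaning identical topologies and 1 meaning no shared groupings. The sketch below illustrates a simplified rooted (clade-based) variant in Python, assuming trees are given as nested tuples of taxon names. The function names and the five-taxon example are hypothetical, and this is not the implementation or the significance test used in this study.

```python
def clades(tree):
    """Return (all leaves, set of non-trivial clades) for a nested-tuple tree.

    A leaf is a string; an internal node is a tuple of child subtrees.
    Each clade is stored as a frozenset of the leaf names below that node.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade is shared by definition
    return all_leaves, found


def normalised_rf(tree_a, tree_b):
    """Clades found in only one tree, divided by the total clades in both."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    total = len(clades_a) + len(clades_b)
    if total == 0:
        return 0.0
    return len(clades_a ^ clades_b) / total


if __name__ == "__main__":
    # Hypothetical species tree and codon-usage dendrogram on taxa A to E.
    species_tree = ((("A", "B"), "C"), ("D", "E"))
    rscu_dendrogram = ((("A", "C"), "B"), ("D", "E"))
    print(normalised_rf(species_tree, rscu_dendrogram))
```

In this toy case the two trees disagree only on the placement of B and C, so one clade from each tree is unmatched out of six in total.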

Discussion
In this study, we inferred a phylogeny of Fusarium and allied genera using the greatest number of loci to date, with almost all branches significantly supported ( fig. 2A). This adds to numerous recent efforts to produce high quality fungal phylogenies from genome-scale data (e.g., Steenwyk et al. 2019;Varga et al. 2019;Li et al. 2021). Trimming method and inclusion/exclusion of multi-copy genes had some impact on species tree topology ( fig. 1, supplementary fig. S2, Supplementary Material online), but the Fusarium s. str. backbone was consistent across all approaches and in general agreement with the most recently published phylogeny of the group (Crous, Osieck, et al. 2021). Discordance was concentrated in the ambrosia clade in Neocosmospora, perhaps due to the occurrence of interspecific hybridization in this lineage (Kasson et al. 2013) or horizontal gene transfer via the exchange of strains by beetles (Hulcr and Cognato 2010). The objectives of this study were not concerned with the taxonomic debates surrounding the Fusarium generic concept, but our results did show that the divergence between Fusarium s. str. and other fusarioid taxa was associated with positive selection on a considerable number of core genes ( 5B). While these results obviously do not directly contribute to characterisation of the taxa involved, they might be seen as a symptom of a "larger and more abrupt" divergence than that between species within the same genus (Booth 1978), contrary to Fusarium s. lat. (O'Donnell et al. 2020;Geiser et al. 2021).
We generally found the IR molecular clock model to produce dating estimates that were more concordant with estimates from other studies assessing divergence times of fusarioid fungi (e.g., Kasson et al. 2013; O'Donnell et al. 2013, 2015), which was largely to be expected considering that these studies also used IR models (but different secondary calibrations). The IR model estimated the divergence of obligate insect mutualists to correspond more closely to the crown age of their insect hosts, as estimated with insect fossil calibrations (Jordal and Cognato 2012). By contrast, the AR model appeared to produce less congruent ages for recently diverged lineages, such as the highly specialised FOSC strains diverging before their host plants are likely to have existed. AR models have generally been thought appropriate for plants and animals considering the correlation between substitution rate and life-history traits (Lartillot et al. 2016), and it has furthermore been suggested that AR is the norm across all kingdoms of life (Tao et al. 2019). On the other hand, Taylor and Berbee (2006) found no lineage-specific correlation of substitution rates across the kingdom Fungi. Similarly, Linder et al. (2011) did not find strong evidence for rate autocorrelation across plant and simian datasets, instead finding the IR model to have more explanatory power. The AR model is not immune to bias (Lartillot and Delsuc 2012), and has been shown to produce older estimates for simulated datasets across dating tools, including MCMCTree (Miura et al. 2020). The presence of short-term rate fluctuations in mammals suggests that mixed relaxed clock models accounting for both autocorrelation and jumps in rate variation are needed (Ho 2009; Lartillot et al. 2016).
Sources of error in divergence time estimation are manifold, as evidenced by the large confidence intervals in our analysis (supplementary fig. S3, Supplementary Material online). Beyond the difficulty surrounding choice and implementation of molecular clock models, a major source of error is the use of secondary calibrations, a necessity due to the general lack of fungal fossil data (Beimforde et al. 2014), which can impact the precision and accuracy of divergence time estimates (Shaul and Graur 2002; Graur and Martin 2004; Sauquet et al. 2012; Schenk 2016). For this reason, we incorporated the error from node ages estimated using primary fossil calibrations (Lutzoni et al. 2018) using confidence intervals to provide upper and lower bounds, as recommended when using secondary calibrations (Graur and Martin 2004; Forest 2009; Hipsley and Müller 2014). An alternative approach is to expand taxon sampling until fossil data can be incorporated, although secondary calibrations have been shown to produce divergence time estimates with similar accuracy to those from distant primary calibrations, albeit with lower precision (Powell et al. 2020). Our motivation for divergence time estimation was not to test specific time-dependent hypotheses, but rather to calibrate branch lengths for more realistic measures of phylogenetic distance in subsequent comparative analyses. As with any divergence time analysis, major uncertainties are still associated with the divergence times of fusarioid fungi.
All taxa had a similar number of genes, CSEPs and CAZymes, very few of which were strain-specific (fig. 2A). It has previously been suggested that the number of species-specific secreted proteins (and by extension, we assume, effectors) is generally higher in fungal lifestyles which associate with plants without killing or decaying them, such as mutualistic symbionts and biotrophic pathogens, compared with saprotrophs and necrotrophic pathogens (Kim et al. 2016), the reasoning being that the former have to negotiate the plant-fungal interaction for an extended period. In the genus Colletotrichum, however, a reduction in the number of species-specific CSEPs was observed alongside the transition from phytopathogenicity to beneficial endophytism, showing that CSEPs and their impact on the plant-fungal interaction can be highly lineage-specific. We saw no significant difference in the number of strain-specific CSEPs (or genes) between any lifestyles (supplementary fig. S4, Supplementary Material online). This, combined with the fact that plant pathogens are often also reported as endophytes and vice versa (fig. 2A), and that plant pathogen and endophyte strains were not significantly different in gene, CSEP, or CAZyme repertoires (fig. 2B), suggests that fusarioid taxa have a shared genetic capacity for phytopathogenicity and/or endophytism. Having a high proportion of species-specific CSEPs has also been associated with the connected factor of host specialisation (Spanu et al. 2010), which, considering we report very low numbers of strain-specific genes, may also explain the status of many Fusarium taxa as host generalists. Our results were also similar to those comparing pathogenic and nonpathogenic taxa in another genus of broad generalists, Aspergillus (Mead et al. 2021).
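The distinction between core, accessory, and strain-specific genes used throughout can be made concrete with a small presence/absence example. The Python sketch below labels each orthogroup as core (present in all taxa), accessory (present in more than one taxon but not all), or strain-specific (present in a single taxon). The function name, orthogroup identifiers, and taxon labels are hypothetical and are not the study's data or pipeline.

```python
def classify_orthogroups(presence, taxa):
    """Label orthogroups by how many of the given taxa contain them.

    presence: dict mapping orthogroup ID -> set of taxa in which it occurs
    taxa:     set of all taxa under comparison
    Orthogroups absent from every taxon in `taxa` are left unlabelled.
    """
    labels = {}
    for orthogroup, members in presence.items():
        n_present = len(set(members) & set(taxa))
        if n_present == 0:
            continue
        if n_present == len(taxa):
            labels[orthogroup] = "core"
        elif n_present == 1:
            labels[orthogroup] = "strain-specific"
        else:
            labels[orthogroup] = "accessory"
    return labels


if __name__ == "__main__":
    # Hypothetical taxa and orthogroups for illustration only.
    taxa = {"strain_1", "strain_2", "strain_3"}
    presence = {
        "OG0001": {"strain_1", "strain_2", "strain_3"},
        "OG0002": {"strain_1", "strain_3"},
        "OG0003": {"strain_2"},
    }
    for orthogroup, label in sorted(classify_orthogroups(presence, taxa).items()):
        print(orthogroup, label)
```

Counting the labels per taxon gives the kind of tally compared between lifestyles above, where few genes or CSEPs fell into the strain-specific category.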
We did not identify common genetic signatures for the endophytic lifestyle in terms of gene, CSEP or CAZyme content, reinforcing the current understanding that there is no universal "toolkit" associated with the endophytic lifestyle (Knapp et al. 2018). This contrasts with other well-defined lifestyles such as that of mycorrhizal fungi, for which specific genetic features have been associated with lifestyle in both ascomycetes and basidiomycetes (Martin et al. 2010; Delaux et al. 2013; Kohler et al. 2015; Peter et al. 2016; Miyauchi et al. 2020; Rich et al. 2021). One observed hallmark of the transition to mycorrhizal symbiosis is the loss of genes encoding PCWDEs (Kohler et al. 2015; Peter et al. 2016; Miyauchi et al. 2020), but, as we found here, these are retained in various endophytic taxa (Zuccaro et al. 2014; Lahrmann et al. 2015; Hacquard et al. 2016; Franco et al. 2022; Mesny et al. 2021). As PCWDEs have often been treated predominantly as features of saprotrophy, this has fed into the hypothesis that many endophytes are latent saprotrophs, but in a broad comparison of CAZymes across the Dikarya, Zhao et al. (2013) demonstrated that plant pathogens have on average more CAZymes belonging to typical PCWDE families than saprotrophs. As there was no significant difference in total number or repertoire of CAZymes between plant pathogens, endophytes and saprotrophs, this indicates that fusarioid fungi retain the same machinery for plant cell wall degradation and/or remodelling, regardless of lifestyle. We did, however, find a significant difference in CSEP content between saprotrophs and endophytes (fig. 2B), which could suggest that fusarioid endophytes are more likely to be latent pathogens than saprotrophs.
The major exception to the apparent lifestyle flexibility among fusarioid fungi is the insect mutualist lifestyle, which formed a monophyletic group (the ambrosia clade) in Neocosmospora (fig. 2A). The insect mutualist lifestyle was also the most distinct in terms of gene and CSEP content, being significantly different from all other lifestyles apart from the mycoparasitic lifestyle (fig. 2B), but the very small sample size for the latter will have impacted the test's power in that case (Alekseyenko 2016). The transition to symbiotic mutualism in Neocosmospora was not associated with a reduction in total number of genes, CSEPs or CAZymes, in agreement with results from other ectosymbiotic insect mutualists (Biedermann and Vega 2020). As the representative strains used in this study are all known to cause disease on the trees they colonise with their beetle partner (O'Donnell et al. 2016; Na et al. 2018; Aoki et al. 2019), it follows that they would have retained many of the genetic mechanisms from their (presumably) plant associated ancestors. Some strains have been found to cause disease in vitro in the absence of their beetle partners (e.g., Eskalen et al. 2012; Na et al. 2018); however, to our knowledge, fusarioid ambrosia fungi have never been reported as free-living in the wild.
Although we did not identify significant differences in the genetic repertoires between fusarioid endophytes and plant pathogens, we did find some evidence that copy number variation (genes or regions that are either duplicated or deleted in reference to other taxa) may be contributing to lifestyle. There was no significant difference in mean gene copy number between lifestyles, but plant pathogens included extreme outliers in gene copy number compared with other lifestyles (fig. 2C). Extensive gene duplication has been suggested as a key strategy for pathogenicity in basidiomycete rusts (Pendleton et al. 2014), and copy number of the pectin degrading CAZyme subfamily PL1_7 across 41 root-colonising fungi was shown to correlate with pathogenicity in Arabidopsis (Mesny et al. 2021). Gene duplication is regarded as the primary resource for the evolution of functional novelties, and the persistence of gene duplicates is indicative of neofunctionalisation and/or subfunctionalisation, as a functionally redundant gene copy will be rapidly lost due to the absence of selective pressure to retain it (Lynch and Conery 2000; He and Zhang 2005). The most common functional innovations of gene copies in fungi are regulatory changes (Wapinski et al. 2007). Indeed, copy number variation is known to be correlated with differential gene expression (Stranger et al. 2007; Steenwyk and Rokas 2018; Shao et al. 2019), and has been shown to contribute to phenotypic or pathological differences in fungi (Steenwyk et al. 2016; Zhao and Gibbons 2018).
This aligns with mounting evidence that a major factor impacting lifestyle of closely related phytopathogens and endophytes is not gene repertoire itself, but expression profiles. Returning to Colletotrichum, Hacquard et al. (2016) found that a pathogenic taxon had a different pattern of gene expression during host colonisation, including upregulation of CSEPs, compared with a closely related and genetically similar beneficial endophyte. The authors noted that this also makes the beneficial endophyte genetically capable of reverting to pathogenicity (and, presumably, the closely related pathogens capable of inhabiting plants as endophytes). The aforementioned CAZyme subfamily PL1_7, which we found between 2 and 4 copies of in all fusarioid taxa (supplementary fig. S7, Supplementary Material online), was also more highly expressed in the pathogenic Colletotrichum taxon. The importance of expression has already been seen in Fusarium, where expression of secondary metabolites differed between endophytic and pathogenic strains of the same species, F. annulatum (as F. proliferatum, FFSC), despite generally sharing secondary metabolite gene clusters (Niehaus et al. 2016). Generating in planta expression profiles for both pathogenic and non-pathogenic strains across the group could reveal whether there is convergence in expression patterns for certain lifestyles.
Regulation of certain genes located on accessory chromosomes has also been seen to direct plant infection phenotypes in an endophytic versus pathogenic FOSC strain (Guo et al. 2021). Accessory chromosomes, which are not essential for survival but potentially confer functional advantages (Bertazzoni et al. 2018), are likely another important factor impacting lifestyle in Fusarium. The first accessory chromosomes in fungi were discovered in the fusarioid species Neocosmospora haematococca (as Nectria haematococca) (Coleman et al. 2009), with further reports in at least nine other fusarioid strains (Bertazzoni et al. 2018). They have mostly been studied in the FOSC, in which horizontal transfer of accessory chromosomes can confer pathogenicity (Ma et al. 2010; Li et al. 2020). Not only are accessory chromosomes deemed to be a key innovation for rapid adaptation by plant pathogens (Croll and McDonald 2012), they have also been implicated in adaptation of FOSC strains to human pathogenicity (Zhang, Yang, et al. 2020). Exploring the extent of accessory chromosomes broadly across fusarioid fungi, as well as phenomena impacting genomic architecture such as transposable elements (Muszewska et al. 2019), may shed light on the mechanisms underlying lifestyle flexibility in the group (Ma et al. 2013).
As effectors are highly diverged and often lineage-specific, if not strain-specific, only a small proportion of the CSEPs predicted here could be matched to experimentally verified genes from PHI-base. Of these, the majority were genes known to impact virulence to some degree or not at all in the hosts they have been tested on (supplementary fig. S6, Supplementary Material online), although the knockout mutant phenotype for a certain gene will not necessarily be the same for different fungal strains or on different hosts. PHI-base is also explicitly dedicated to pathogen-host genes, and similar high-quality, curated resources are needed for genes involved in non-pathogenic fungal-host interactions. Nonetheless, our results give us a broad perspective on CSEP distributions across fusarioid fungi. Some CSEPs exhibited phylogenetic patterns (such as lower copy number in Fusarium s. lat. compared with Fusarium s. str. for MoCDIP4, which was first discovered in Magnaporthe oryzae (Chen et al. 2013) and since reported in F. oxysporum f. sp. pisi (Achari et al. 2021
Hill et al. · https://doi.org/10.1093/molbev/msac085 MBE
A slightly lower proportion of core CSEPs were found to be positively selected than non-CSEPs according to dN/dS calculations (fig. 3A). This may be seen as surprising, as effectors that promote virulence are assumed to be under strong selective pressure during the evolutionary arms race between fungus and host (De Jonge et al. 2011; Lo Presti et al. 2015). For instance, CSEPs have been found to be under positive selection more frequently than non-CSEPs in phytopathogenic Microbotryum species (Beckerson et al. 2019). High rates of selection on CSEPs are not only a hallmark of pathogenicity, however, as these have also been observed for obligate, host-specific Epichloë endophytes (Schirrmann et al. 2018); the arbuscular mycorrhizal fungus Rhizophagus irregularis (Schmitz et al. 2019); and the saprotroph Verticillium tricorpus, emphasising the broader roles played by effectors in host-fungal interactions. Our results could be explained by the fact that we focused on core genes, and so the CSEPs in question are presumably contributing to integral host-fungal interactions that would be under similar selective pressure as other core functions, rather than specialised CSEPs more likely to be under strong selective pressure from the host. We should also note that detection of positive selection with dN/dS methods is biased against shorter genes (Derbyshire et al. 2021), which CSEPs by definition are, and so this may have impacted our results.
We identified five cases where positive selection of core CSEPs and CAZymes may be connected to lifestyle by comparing aBSREL analysis of positive selection on external branches to Contrast-FEL analysis of relative selection pressures between lifestyles. Interestingly, there were no core CSEPs with higher selective pressure in plant pathogens relative to other lifestyles, which could be interpreted as evidence that the ancestral state of the group is phytopathogenic rather than endophytic, but the unbalanced sample sizes for the different lifestyles will have influenced the Contrast-FEL results. Once again, the insect mutualist lifestyle was shown to be distinct, with a greater number of sites per gene undergoing higher selective pressure relative to other lifestyles (fig. 3B). This may be associated with the fact that these ambrosia taxa have evolved via insect farming, in what could be interpreted as some level of "artificial selection" (Mueller et al. 2005). We were only able to tentatively link the positive selection of one core CAZyme to lifestyle: 5′-oxoaverantin cyclase in the AA3_2 subfamily, which was positively selected for in the insect mutualist N. oligoseptata (fig. 3A). Other members of the same subfamily are implicated in lignin degradation (Levasseur et al. 2013; Miyauchi et al. 2017), but 5′-oxoaverantin cyclase was first identified in the aflatoxin biosynthesis pathway of Aspergillus parasiticus (Sakuno et al. 2003). Another insect-fungus mutualism, between the navel orangeworm and A. flavus, has shown that aflatoxin tolerance is a key adaptation of the insect to its fungal diet (Niu et al. 2009; Ampt et al. 2016), and as fusarioid fungi are known to produce an array of mycotoxins (Desjardins and Proctor 2007), it would be interesting to determine whether there is a similar dynamic in the evolution of the ambrosia mutualism.
Conventional dN/dS methods to detect selection such as aBSREL and BUSTED make the assumption that synonymous substitutions are always selectively neutral, but we now know that selection does occur on synonymous mutations (Ohta 1996; Chen et al. 2004; Hershberg and Petrov 2008). Subsequently, dN/dS methods have been shown to overestimate the frequency of positive selection and underestimate the strength of negative selection in bacteria, even when selection on synonymous sites is weak (Rahman et al. 2021). Furthermore, using dN/dS > 1 as a signifier of positive selection has been declared arbitrary (Tamuri and dos Reis 2021). As flexible dN/dS methods accounting for selection on synonymous substitutions have yet to be integrated into the widely used tools for detecting positive selection, this remains a caveat of our dN/dS analyses. Additionally, even a low incidence of sequence inaccuracies can result in false-positive signals of selection (Mallick et al. 2009), so ideally candidate genes should be resequenced to detect errors and confirm whether sites are truly under selection. A further limitation of the selection analyses is that they were restricted to core genes due to the requirement of a robust species tree to estimate dN/dS across lineages, which necessarily excludes a large proportion of the gene content (Derbyshire et al. 2021). Further exploration of selection dynamics in the extensive accessory content would undoubtedly shed more light on the evolution of the group.
When exploring the issue of selection on synonymous substitutions, we showed that codon optimisation of the core single-copy genes (that is, the extent of translational selection on codon usage) was higher in CSEPs and CAZymes than other genes (fig. 4B), as was previously found in the F. oxysporum f. sp. cepae pangenome (Armitage et al. 2018). Insect mutualists had a much larger difference in codon optimisation between CSEPs and CAZymes (fig. 4B). One possible explanation for this result is that these taxa, being farmed by insects which excavate and weaken the plant hosts, may have less translational selective pressure on CSEPs that are required for plant invasion, but retain higher translational selective pressure on CAZymes that are required for assimilation of nutrients, which ultimately maintains the insect-fungus mutualism. Following this broad perspective on codon optimisation, further functional annotation could allow the use of a "reverse ecology framework" to explore whether genes with the highest codon optimisation correspond with lifestyle (LaBella et al. 2021).
We also found that correlation between lifestyle range and codon optimisation was not significant after correcting for phylogenetic relationships (supplementary fig. S8, Supplementary Material online), contrary to expectation from previous studies (Botzman and Margalit 2011; Badet et al. 2017). Our approach to assess lifestyle range was limited by the availability of published reports of fusarioid taxa, and so we will undoubtedly have underestimated the number of lifestyles exhibited by some species. Furthermore, fusarioid species are often hard to distinguish, and lifestyle reports may therefore be misattributed. To mitigate this issue, we only included studies that used appropriate genetic markers to distinguish taxa (not, for instance, solely ITS; Geiser et al. 2004) and cross-checked phylogenetic analyses for misclassifications. Despite this, we may have inadvertently included lifestyle reports for species that were incorrectly classified in the original study. A comprehensive meta-analysis is needed to better understand the extent of lifestyle and host range for fusarioid taxa.
A major caveat of our comparative analyses is that we were forced to attribute a single lifestyle to the strains being used, despite the current understanding, which our own results support, that these lifestyles are not necessarily mutually exclusive (Selosse et al. 2018). Furthermore, treating lifestyles as categorical traits does not accurately reflect the range of outcomes we know can exist within even one lifestyle, such as different pathogenic strains within the same species varying in "aggressiveness" (e.g., Holtz et al. 2011; Chen et al. 2014; Šišić et al. 2018). These both remain central issues with current approaches to fungal lifestyle comparison at large (e.g., Knapp et al. 2018; Miyauchi et al. 2020; Mesny et al. 2021; Franco et al. 2022). New methods that can effectively incorporate multiple lifestyle hypotheses, or treat lifestyles as points on a continuous spectrum, are sorely needed to encapsulate the nuance of these highly context-dependent interactions.

Conclusions
We found an apparent shared genetic capacity for phytopathogenicity and endophytism in Fusarium, which suggests that, while strains may be reported as plant pathogens or endophytes, their lifestyle is potentially transient. Were fusarioid taxa to make the transition to obligate, mutualistic endophytism, we might expect to see genetic hallmarks more akin to those seen in the transition to obligate symbiosis in mycorrhizal lifestyles (e.g., Delaux et al. 2013). Despite multiple reports of certain endophytic Fusarium strains being beneficial to certain plant hosts (e.g., Kavroulakis et al. 2007; Mendoza and Sikora 2009; Bilal et al. 2018), large uncertainties remain as to the stability of these interactions. Our results depict fusarioid fungi as prolific generalists and highlight the difficulty in predicting pathogenic potential in the group. Considering the importance of plant immune response and of biotic and abiotic conditions to the plant-fungal interaction, such endophytes may not be the "silver bullet" for biocontrol that they are sometimes touted to be.

Materials and Methods
Genome Sequencing, Assembly, and Structural Annotation
We selected five endophytic Fusarium strains for whole-genome sequencing. These were representatives of species hypotheses that had previously been isolated and clustered into 99% similarity operational taxonomic units by Hill et al. (2021), with taxonomic identification confirmed where possible via morphological assessment by the Westerdijk Institute (supplementary table S2, Supplementary Material online). For DNA extractions, a fragment of mycelium from axenic cultures was transferred to 500 ml of 2% malt extract nutrient broth using a sterile needle and grown at 25°C in ambient light conditions on an orbital shaker at 120 rpm for 1 week. Mycelia were collected via vacuum filtration and frozen at −80°C before being pulverised with two sterile stainless-steel beads in a 2 ml Eppendorf tube using a Mixer Mill MM 400 (Retsch, Germany).
DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, CA, USA) according to the manufacturer's protocol and eluted in 70 µl of TE buffer. Sufficient DNA concentration (more than 20 ng/µl) was confirmed with a Quantus™ Fluorometer (Promega, WI, USA) and purity (260/280 absorbance ratio of approximately 1.8) confirmed with a NanoDrop spectrophotometer (Thermo Fisher Scientific, MA, USA). DNA extractions were sent to Macrogen (Macrogen Inc., South Korea) for library preparation and sequencing: library preparation was performed using the TruSeq DNA PCR-free Sample Preparation Kit with a 550 bp insert size, and 151 bp paired-end reads were sequenced using the NovaSeq 6000 platform (Illumina, San Diego, CA, USA).

Phylogenomic Analyses
Predicted genes from 57 additional publicly available strains of Fusarium and allied genera were downloaded from NCBI (supplementary table S1, Supplementary Material online) and orthogroups (referred to here as genes) were inferred from amino acid sequences of the total 62 strains using OrthoFinder v2.4.0 (Emms and Kelly 2019). We aligned 1,060 core (i.e., shared between all fusarioid taxa including the outgroup) single-copy genes using MAFFT v7.310 with default settings (Katoh and Standley 2013) and removed ambiguously aligned regions using both BMGE v1.12 (Criscuolo and Gribaldo 2010) and trimAl v1.4.rev15 with the gappyout option (Capella-Gutiérrez et al. 2009) to compare the impact of trimming tools on the resulting species trees.
For a coalescent-based approach, ML gene trees were inferred from each core single-copy gene alignment with RAxML-NG using the best-fit model selected by ModelTest-NG during the concatenated analysis. The resulting ML gene trees were used for coalescent-based species tree reconstruction using ASTRAL-III v5.7.3 (Zhang, Rabiee, et al. 2018) with local posterior probability (LPP) branch support estimation (Sayyari and Mirarab 2016). ASTRAL-Pro v1.2 (Zhang, Scornavacca, et al. 2020) was additionally run with LPP support estimation on the 20,343 gene trees produced by OrthoFinder, which represented both single- and multi-copy "total" genes. OrthoFinder itself also produces a coalescent-based species tree topology by default using STAG (Emms and Kelly 2018), which used 3,449 core single- and multi-copy genes. All species tree topologies were compared by computing the normalised Robinson-Foulds metric using the RF.dist function from the phangorn v2.7.0 package (Schliep et al. 2017) in R v4.0.4 (R Core Team 2020).
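To illustrate what the normalised Robinson-Foulds metric measures, the following is a minimal Python sketch rather than the phangorn implementation used in this study. It assumes trees are given as nested tuples of taxon names, treats them as unrooted, and divides the number of differing bipartitions by 2(n − 3), the maximum for two fully resolved unrooted trees with n taxa; the function names and the example taxa are hypothetical, and the exact normalisation applied by RF.dist may differ.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees (a nested-tuple analogue of Newick).
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the names of all leaves below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    names: FrozenSet[str] = frozenset()
    for child in tree:
        names |= leaf_set(child)
    return names


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial splits of the tree, ignoring the root position."""
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)  # each split is stored as the side without this leaf
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade: FrozenSet[str] = frozenset()
        for child in node:
            clade |= visit(child)
        side = all_leaves - clade if reference in clade else clade
        # Splits separating fewer than two taxa occur in every tree and are skipped.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def normalised_rf(tree_a: Tree, tree_b: Tree) -> float:
    """Differing bipartitions divided by 2(n - 3) (an assumed normalisation)."""
    leaves_a, leaves_b = leaf_set(tree_a), leaf_set(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    n = len(leaves_a)
    if n < 4:
        raise ValueError("at least four taxa are needed")
    differing = bipartitions(tree_a) ^ bipartitions(tree_b)
    return len(differing) / (2 * (n - 3))


# Hypothetical five-taxon example: the two trees disagree on one split each.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
print(normalised_rf(tree_1, tree_2))  # 0.5
```

A value of 0 indicates identical unrooted topologies and a value of 1 indicates that two fully resolved trees share no non-trivial bipartition.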

Molecular Clock Analyses
The species tree topology inferred by RAxML-NG was used to perform molecular clock analyses with MCMCTree (Yang and Rannala 2006) in PAML v4.9 (Yang 2007) using the top 10 "clock-like" core single-copy genes, as inferred by SortaDate based on root-to-tip variance (Smith et al. 2018). Divergence times were estimated using the approximate likelihood method (dos Reis and Yang 2011) with the WAG amino acid substitution model (Whelan and Goldman 2001). Due to the sparse fossil record for the fungi at large, a previous fossil-calibrated study of the kingdom including Fusarium species was used to inform secondary calibrations of the tree root at 0.9-1.35 (1 time unit being 100 My) and the node representing the split between F. graminearum and "F." solani at 0.5-0.9 (Lutzoni et al. 2018). For details of MCMCTree priors and run settings, see Supplementary Material.

Computational Prediction of CSEPs and CAZymes
CSEPs were identified from predicted genes using a framework inspired by Beckerson et al. (2019) and summarised in supplementary figure 9A, Supplementary Material online, including the following: 1) signal peptide detection; 2) filtering for contradictory cellular localisation signals; and 3) cross-checking against machine learning-based effector prediction using EffectorP 3.0 (Sperschneider and Dodds 2021). A custom bash script, CSEPfilter, was written to perform the filtering of gene sets at each stage. To match CSEPs to experimentally verified genes, sequences were searched against the PHI-base database (downloaded 09/02/2022; Urban et al. 2020) using BLAST 2.7.1+ (Camacho et al. 2009). CAZymes were identified from predicted genes using run_dbCAN v3.0.2 (https://github.com/linnabrown/run_dbcan) from the dbCAN2 CAZyme annotation server (Zhang, Yohe, et al. 2018) and assigned names using the ExplorEnz website (McDonald et al. 2009). For full details on both CSEP and CAZyme prediction, see Supplementary Material.
CSEPs and CAZymes were matched to gene orthogroups with a custom R script, orthogroup_parser.r, where a gene was defined as a CSEP/CAZyme if it was predicted to be so in at least one taxon. We checked that genome assembly quality did not significantly influence the number of predicted genes, CSEPs or CAZymes by confirming that there was no correlation between assembly N50 (extracted from NCBI metadata for assemblies produced outside this study) and number of genes/CSEPs/CAZymes using the cor.test function in R.
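The three-stage logic of the framework can be sketched as follows. This is an illustrative Python outline, not the CSEPfilter script itself: the record type, its field names, and the protein identifiers are hypothetical, and it assumes that a protein is retained only if it passes all three stages, with the upstream tool calls already summarised as boolean flags.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ProteinCalls:
    """Hypothetical per-protein summary of upstream predictions."""
    protein_id: str
    has_signal_peptide: bool               # stage 1: signal peptide detected
    has_contradictory_localisation: bool   # stage 2: e.g. a signal arguing against secretion
    effectorp_effector: bool               # stage 3: called an effector by EffectorP


def filter_cseps(proteins: Iterable[ProteinCalls]) -> List[str]:
    """Return IDs of proteins passing all three stages, in input order."""
    kept = []
    for protein in proteins:
        if not protein.has_signal_peptide:
            continue  # not predicted to be secreted
        if protein.has_contradictory_localisation:
            continue  # localisation evidence conflicts with secretion
        if not protein.effectorp_effector:
            continue  # not supported by the machine-learning prediction
        kept.append(protein.protein_id)
    return kept


# Hypothetical input: only the first protein passes every stage.
example = [
    ProteinCalls("prot_1", True, False, True),
    ProteinCalls("prot_2", True, True, True),
    ProteinCalls("prot_3", False, False, True),
    ProteinCalls("prot_4", True, False, False),
]
print(filter_cseps(example))  # ['prot_1']
```

Applying the stages sequentially makes it straightforward to record how many proteins are removed at each step.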
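The "at least one taxon" rule for labelling orthogroups can be expressed compactly. The sketch below is in Python rather than R and is not the orthogroup_parser.r script; the function name, the orthogroup names, and the protein identifiers are hypothetical, and it assumes orthogroup membership is available as a mapping from orthogroup name to member protein IDs.

```python
from typing import Dict, Iterable, Set


def label_orthogroups(
    orthogroups: Dict[str, Iterable[str]],
    predicted: Set[str],
) -> Dict[str, bool]:
    """Flag an orthogroup as True if any member protein is in the predicted set."""
    return {
        name: any(member in predicted for member in members)
        for name, members in orthogroups.items()
    }


# Hypothetical orthogroups and CSEP predictions pooled across taxa.
orthogroups = {
    "OG_a": ["taxon1_g1", "taxon2_g7"],
    "OG_b": ["taxon1_g2", "taxon2_g9", "taxon3_g4"],
}
predicted_cseps = {"taxon3_g4"}
print(label_orthogroups(orthogroups, predicted_cseps))  # {'OG_a': False, 'OG_b': True}
```

Under this rule, a single positive prediction in any taxon is enough to label the whole orthogroup, which is permissive by design.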

Comparative Genomics of Lifestyle
Lifestyles of all the strains used in this study were inferred from the host/substrate and other relevant data (such as pathogenicity tests) sourced from the literature, NCBI BioSample metadata, and online culture collection metadata (supplementary table S1, Supplementary Material online). If a strain was reported from a plant host but without sufficient clarification of whether the plant was exhibiting disease symptoms or the fungus was isolated from inside plant tissues, the strain was classified ambiguously as a "plant associate". In addition to the lifestyle of the specific strains used in the analyses, other lifestyle reports were collected from the literature with the help of the PlutoF platform (Abarenkov et al. 2010) in order to show the range of reported lifestyles for taxa.
The impact of strain lifestyle on CSEP, CAZyme, and all gene content was explored using an approach developed by Mesny and Vannier (2020) which accounts for confounding phylogenetic signal. Full details are described in Supplementary Material, as well as a full description of the statistical analyses to test the significant difference in number of strain-specific genes and mean gene copy number between lifestyles.

Selection Analyses
To assess whether core single-copy genes have evolved under positive selection we used HyPhy v2.5.30 (Kosakovsky Pond et al. 2005), which offers a suite of tools for assessing selective pressures based on the ratio of nonsynonymous to synonymous substitutions (dN/dS), that is, the ratio of nucleotide substitutions which alter the encoded amino acid to those that do not. Notably, this approach assumes that synonymous substitutions are selectively neutral. For full details see Supplementary Material; briefly, codon alignments and ML trees were run in BUSTED v3.1 and aBSREL v2.2 to detect episodic positive selection (dN/dS > 1). We then used Contrast-FEL to compare differences in relative selective pressures between lifestyles (Kosakovsky Pond et al.).
Codon optimisation of all core single-copy genes to the ribosomal protein gene pool (S) was calculated from each taxon's codon adaptation index (CAI; Sharp and Li 1987), effective number of codons (Nc) and GC3 values with the get.s function from the tAI v0.2 package (dos Reis et al. 2004). For full details, see Supplementary Material. S values were calculated for CSEP, CAZyme, non-CSEP/CAZyme, and all core single-copy genes in turn. To then assess the relationship between codon optimisation and lifestyle range, we calculated Pearson's correlation on uncorrected data using the cor.test function in R, and used PGLS regression to assess correlation while correcting for phylogenetic signal in the data with the R package nlme v3.1-152 (Pinheiro et al. 2021). For number of reported lifestyles, only taxa identified to species level were included, and for species with multiple representative strains the mean S value was used. To visualise the relationship between codon optimisation and phylogeny, we used the ordisurf function from the R package vegan v2.5-7 (Oksanen et al. 2019) to fit S values to the PCA of phylogenetic distances produced in comparative analyses above (recreated in R with the prcomp function). See Supplementary Material for full details of statistical analyses on the difference in S values between Fusarium s. str. and allied genera, as well as between gene types and lifestyles.
The uco function from seqinr v4.2-8 (Charif and Lobry 2007) was used to calculate codon usage bias in terms of relative synonymous codon usage (RSCU), the ratio of observed codon usage to expected codon usage, for all codons across each taxon, excluding stop codons and the non-redundant codons encoding methionine and tryptophan. RSCU values were then normalised using the scale function and used to produce a Euclidean distance matrix with the dist function, which was used for hierarchical clustering of taxa with the hclust function using the average agglomeration method. We compared the topology produced by hierarchical clustering with the RAxML-NG species tree topology by again computing the normalised Robinson-Foulds metric using the RF.dist function from phangorn. We calculated the P-value by computing the metric for 1,000 random trees with the same number of taxa against the species tree topology to determine the number of simulations for which the metric was lower (i.e., topologically closer) than that from the hierarchical clustering.
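As a worked illustration of the RSCU definition above, the following Python sketch computes RSCU from a set of coding sequences. It is not the seqinr code used in this study: it assumes the standard genetic code, in-frame sequences, and that expected usage is equal use of all synonymous codons for an amino acid; the function name and the example sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict, Iterable

# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
EXCLUDED = {"M", "W", "*"}  # single-codon amino acids and stop codons


def rscu(cds_sequences: Iterable[str]) -> Dict[str, float]:
    """Observed codon count divided by the count expected under equal synonymous usage."""
    counts: Counter = Counter()
    for sequence in cds_sequences:
        sequence = sequence.upper().replace("U", "T")
        usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
        for start in range(0, usable, 3):
            codon = sequence[start:start + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in EXCLUDED:
            families[aa].append(codon)

    values: Dict[str, float] = {}
    for codons in families.values():
        total = sum(counts[codon] for codon in codons)
        if total == 0:
            continue  # amino acid absent from the input, so RSCU is undefined
        expected = total / len(codons)
        for codon in codons:
            values[codon] = counts[codon] / expected
    return values


# Hypothetical sequence: Met, four Ala codons (three GCT and one GCC), then a stop.
example = rscu(["ATGGCTGCTGCTGCCTAA"])
print(example["GCT"], example["GCC"], example["GCA"])  # 3.0 1.0 0.0
```

An RSCU of 1 indicates no bias for that codon, and the values for each amino acid sum to the number of its synonymous codons.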
All results were plotted in R v4.0.4 using packages listed in Supplementary Material. Scripts of all analyses are available at https://github.com/Rowena-h/FusariumLifestyles.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.

Data Availability
WGS data and structurally annotated genome assemblies generated in this study are available on GenBank under the BioProject accession PRJNA761077. Additional data files of the raw phylogenetic trees; CSEP and CAZyme amino acid sequences; OrthoFinder output; and orthogroup metadata are deposited in Zenodo doi:10.5281/zenodo.6353640. All scripts are available at https://github.com/Rowena-h/FusariumLifestyles.